{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2024-04-03T12:43:56.465857600Z",
     "start_time": "2024-04-03T12:43:44.237123900Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "read time: 0.03900003433227539\n",
      "decompress time: 3.1128597259521484\n",
      "loads time: 3.5241565704345703\n",
      "assembling time: 3.1167263984680176\n",
      "47 lang_1_fixed\n",
      "58 commons-csv\n",
      "61 commons-cli\n",
      "66 gson\n",
      "82 jfreechart\n",
      "=============\n",
      "89 lang_1_fixed\n",
      "98 gson\n",
      "99 commons-csv\n",
      "100 commons-cli\n",
      "101 jfreechart\n",
      "=============\n",
      "123 jfreechart154\n",
      "=============\n",
      "126 jfreechart154\n",
      "127 lang_1_fixed\n",
      "128 gson\n",
      "130 jfreechart154\n",
      "CPU times: total: 12.1 s\n",
      "Wall time: 12.2 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "from app.database import Database\n",
    "from app.classes import IterationStatus, BatchAttemptStatus, TestStatus, AttemptStatus\n",
    "from typing import List\n",
    "\n",
    "db = Database()\n",
    "# batch_li = [x for x in db.get_all() if x.dataset == 'commons-csv']\n",
    "all_dataset = db.get_all()\n",
    "for d in all_dataset:\n",
    "    print(d.doc_id, d.dataset)\n",
    "    if d.doc_id in [82, 101, 123]:\n",
    "        print(\"=============\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8192\n"
     ]
    }
   ],
   "source": [
    "from tqdm import tqdm\n",
    "\n",
    "batch_li = [x for x in all_dataset if x.doc_id in [47, 58, 61, 66, 123]]\n",
    "all_attempts = sum([x.attempts for x in batch_li], [])\n",
    "print(len(all_attempts))"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-03T12:43:56.469858600Z",
     "start_time": "2024-04-03T12:43:56.465857600Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "将这些Java编译错误进行归类需要考虑它们出现的上下文和原因。许多错误可能在某些情况下同时发生，但并不意味着它们总是紧密相关或一个错误的出现必然导致另一个。以下是对这些错误的分析，尝试将它们根据相似的原因或上下文归类：\n",
    "\n",
    "1. **关于类和方法定义的错误**:\n",
    "   - \"Illegal static declaration in inner class\"\n",
    "   - \"cannot inherit from final\"\n",
    "   - \"method does not override or implement a method from a supertype\"\n",
    "   - \"cannot override\"\n",
    "   - \"is not abstract and does not override abstract method\"\n",
    "   - \"enum types must not be local\"\n",
    "   - \"is abstract; cannot be instantiated\"\n",
    "\n",
    "2. **关于变量和类型的错误**:\n",
    "   - \"cannot assign a value to final variable\"\n",
    "   - \"cannot infer type arguments\"\n",
    "   - \"actual and formal argument lists differ in length\"\n",
    "   - \"must be final or effectively final\"\n",
    "   - \"type not allowed here\"\n",
    "   - \"is not within bounds of\"\n",
    "   - \"incompatible types\"\n",
    "   - \"cannot be dereferenced\"\n",
    "\n",
    "3. **关于构造函数和实例化的错误**:\n",
    "   - \"no suitable constructor found\"\n",
    "   - \"qualified new of static class\"\n",
    "   - \"enum types may not be instantiated\"\n",
    "\n",
    "4. **关于访问修饰符和文件结构的错误**:\n",
    "   - \"is public, should be declared in a file named\"\n",
    "\n",
    "5. **关于异常处理的错误**:\n",
    "   - \"unreported exception\"\n",
    "   - \"is never thrown in body of corresponding try statement\"\n",
    "\n",
    "6. **关于循环和迭代的错误**:\n",
    "   - \"for-each not applicable to expression type\"\n",
    "\n",
    "7. **关于方法调用和参数的错误**:\n",
    "   - \"no suitable method found for\"\n",
    "   - \"cannot be applied to given types\"\n",
    "\n",
    "8. **关于语法和表达式的错误**:\n",
    "   - \"cannot find symbol\"\n",
    "   - \"illegal start of expression\"\n",
    "   - \"unclosed character literal\"\n",
    "   - \"illegal escape character\"\n",
    "\n",
    "9. **关于重复定义和命名的错误**:\n",
    "   - \"is already defined in\"\n",
    "\n",
    "10. **关于模糊性和异常的错误**:\n",
    "    - \"is ambiguous\"\n",
    "    - \"has already been caught\"\n",
    "    - \"an enclosing instance that contains\"\n",
    "\n",
    "11. **关于枚举和实例化的错误**:\n",
    "    - \"enum types may not be instantiated\"\n",
    "\n",
    "12. **关于数值和字符字面量的错误**:\n",
    "    - \"integer number too large\"\n",
    "\n",
    "尽管某些错误可能在某些特定情况下同时发生，大多数错误类型都是因为不同的编程错误而独立出现的。因此，它们大多数可以被视为单独的类别，但上述归类尝试根据错误的性质和它们可能共享的上下文将它们分组。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 236,
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "def get_error_msg(attempt):\n",
    "    import requests\n",
    "    base_url = \"http://127.0.0.1:4200/api/logs/filter\"\n",
    "    payload = {\n",
    "        \"logs\": {\n",
    "            \"level\": {\n",
    "                \"ge_\": 40\n",
    "            },\n",
    "            \"flow_run_id\": {\n",
    "                \"any_\": [\n",
    "                    f\"{attempt.link.split('/')[-1]}\"\n",
    "                ]\n",
    "            }\n",
    "        },\n",
    "        \"sort\": \"TIMESTAMP_ASC\",\n",
    "        \"offset\": 0,\n",
    "        \"limit\": 200\n",
    "    }\n",
    "\n",
    "    # parse response\n",
    "    response = requests.post(base_url, json=payload)\n",
    "    arr = response.json()\n",
    "    messages = [x['message'] for x in arr]\n",
    "    return messages\n",
    "\n",
    "\n",
    "error_types = [\n",
    "    \"cannot find symbol\",\n",
    "    ([\"cannot be accessed from outside package\",\"cannot access\",\"has private access\",\"has protected access\",], \"cannot access\"),\n",
    "\n",
    "    \"is ambiguous\",\n",
    "    \"incompatible types\",\n",
    "    \"unclosed character literal\",\n",
    "    \"illegal escape character\",\n",
    "    \"is not abstract and does not override abstract method\",\n",
    "    \"is already defined in\",  # system error\n",
    "    \"unreported exception\",\n",
    "    \"integer number too large\",  # system error\n",
    "    \"illegal start of expression\",\n",
    "    \"cannot be applied to given types\",\n",
    "    \"enum types must not be local\",\n",
    "    \"cannot be dereferenced\",\n",
    "    \"is abstract; cannot be instantiated\",\n",
    "    \"does not exist\",\n",
    "    \"is never thrown in body of corresponding try statement\",\n",
    "    ([\"reached end of file while parsing\",\"not a statement\",r\"\\';\\' expected\", r\"\\')\\' expected\", r\"')' expected\", r\"'\\' expected\", r\"';' expected\", r\"\\';\\' expected\"],\n",
    "     \"syntax error\"),\n",
    "    \"Illegal static declaration in inner class\",\n",
    "    \"cannot assign a value to final variable\",\n",
    "    \"cannot infer type arguments\",\n",
    "    \"cannot inherit from final\",\n",
    "    \"actual and formal argument lists differ in length\",\n",
    "    ([\"no suitable constructor found\",\"no suitable method found for\",], \"no suitable constructor/method found\"),\n",
    "    \"is public, should be declared in a file named\",\n",
    "    \"qualified new of static class\",\n",
    "    \"method does not override or implement a method from a supertype\",\n",
    "    \"must be final or effectively final\",\n",
    "    \"cannot override\",\n",
    "    \"type not allowed here\",\n",
    "    \"has already been caught\",  # system error\n",
    "    \"an enclosing instance that contains\",\n",
    "    \"enum types may not be instantiated\",\n",
    "    \"is not within bounds of\",\n",
    "    \"for-each not applicable to expression type\",\n",
    "]\n",
    "\n",
    "\n",
    "def get_error_set(msg_list, status):\n",
    "    msg_list = [\n",
    "        x for x in msg_list\n",
    "        if \"[ERROR] COMPILATION ERROR\" in x\n",
    "    ]\n",
    "    error_set = {status}\n",
    "    for msg in msg_list:\n",
    "        flag = False\n",
    "        for error_type in error_types:\n",
    "            if isinstance(error_type, tuple):\n",
    "                for sub_error_type in error_type[0]:\n",
    "                    if sub_error_type in msg:\n",
    "                        flag = True\n",
    "                        error_set.add(error_type[1])\n",
    "            elif error_type in msg:\n",
    "                flag = True\n",
    "                error_set.add(error_type)\n",
    "        if not flag:\n",
    "            print(msg)\n",
    "            assert False\n",
    "    if status == str(IterationStatus.Type.SYNTAX_ERROR):\n",
    "        error_set.add(\"syntax error\")\n",
    "        # if flag:\n",
    "        #     break\n",
    "    return error_set"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T13:20:11.049147600Z",
     "start_time": "2024-04-04T13:20:11.046192900Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 219,
   "outputs": [],
   "source": [
    "count = 0\n",
    "attempts = all_attempts"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T12:59:47.275056800Z",
     "start_time": "2024-04-04T12:59:47.270000600Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 215,
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 8192/8192 [01:40<00:00, 81.19it/s] \n"
     ]
    }
   ],
   "source": [
    "from tqdm import tqdm\n",
    "message_list = []\n",
    "for index in tqdm(range(count, len(attempts))):\n",
    "    messages = get_error_msg(attempts[index])\n",
    "    message_list.append(messages)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T12:57:06.631268200Z",
     "start_time": "2024-04-04T12:55:25.719252600Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "outputs": [],
   "source": [
    "# 统计错误类型\n",
    "from collections import Counter\n",
    "\n",
    "error_sets = []\n",
    "for index in range(0, len(message_list)):\n",
    "    attempt = attempts[index]\n",
    "    if attempt.type == IterationStatus.Type.PASS or attempt.type == IterationStatus.Type.RUNTIME_ERROR:\n",
    "        if attempt.index != len(attempt.sub_iteration_status):\n",
    "            for it in attempt.sub_iteration_status[attempt.index:]:\n",
    "                if it.type == IterationStatus.Type.COMPILE_ERROR:\n",
    "                    continue\n",
    "    error_sets.append(get_error_set(message_list[index], str(attempt.type)))\n",
    "    count += 1\n",
    "\n",
    "error_counter = Counter()\n",
    "PASS_error_sets = []\n",
    "COMPILE_ERROR_error_sets = []\n",
    "RUNTIME_ERROR_error_sets = []\n",
    "SYNTAX_ERROR_error_sets = []\n",
    "for error_set in error_sets:\n",
    "    if str(IterationStatus.Type.COMPILE_ERROR) in error_set:\n",
    "        error_set.remove(str(IterationStatus.Type.COMPILE_ERROR))\n",
    "        COMPILE_ERROR_error_sets.append(error_set)\n",
    "    if str(IterationStatus.Type.PASS) in error_set:\n",
    "        error_set.remove(str(IterationStatus.Type.PASS))\n",
    "        PASS_error_sets.append(error_set)\n",
    "    if str(IterationStatus.Type.RUNTIME_ERROR) in error_set:\n",
    "        error_set.remove(str(IterationStatus.Type.RUNTIME_ERROR))\n",
    "        RUNTIME_ERROR_error_sets.append(error_set)\n",
    "    if str(IterationStatus.Type.SYNTAX_ERROR) in error_set:\n",
    "        error_set.remove(str(IterationStatus.Type.SYNTAX_ERROR))\n",
    "        SYNTAX_ERROR_error_sets.append(error_set)\n",
    "    error_counter.update(error_set)\n",
    "\n",
    "\n",
    "# 删除系统错误\n",
    "remove_list = [\n",
    "    \"is already defined in\",\n",
    "    \"integer number too large\",\n",
    "    \"has already been caught\",\n",
    "]\n",
    "while True:\n",
    "    for x in remove_list:\n",
    "        if x in error_counter:\n",
    "            del error_counter[x]\n",
    "            break\n",
    "    else:\n",
    "        break\n",
    "\n",
    "# 将 <50 的类型合并为 \"others\"\n",
    "ec = Counter()\n",
    "for error_set in error_sets:\n",
    "    error_set = [x if error_counter[x] > 50 else \"others\" for x in error_set]\n",
    "    ec.update(error_set)\n",
    "\n",
    "pass_ec = Counter()\n",
    "for error_set in sum([PASS_error_sets,RUNTIME_ERROR_error_sets],[]):\n",
    "    error_set = [x if error_counter[x] > 50 else \"others\" for x in error_set]\n",
    "    pass_ec.update(error_set)\n",
    "\n"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T13:20:19.048389400Z",
     "start_time": "2024-04-04T13:20:18.481731400Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Convert the Counter objects to dictionaries\n",
    "ec_dict = dict(ec)\n",
    "pass_ec_dict = dict(pass_ec)\n",
    "\n",
    "# Merge the two dictionaries into a new dictionary\n",
    "merged_dict = {key: [ec_dict.get(key, 0), pass_ec_dict.get(key, 0), ec_dict.get(key, 0) - pass_ec_dict.get(key, 0)] for key in set(ec_dict) | set(pass_ec_dict)}\n",
    "\n",
    "# Convert the merged dictionary to a pandas DataFrame\n",
    "df = pd.DataFrame.from_dict(merged_dict, orient='index', columns=['ec', 'pass_ec', 'diff'])\n",
    "# Sort the DataFrame by 'ec' in descending order\n",
    "df = df.sort_values(by='ec', ascending=False)\n",
    "\n",
    "# Save the DataFrame to a CSV file\n",
    "df.to_csv('编译错误类型统计.csv', encoding='utf_8_sig')"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T13:20:19.048389400Z",
     "start_time": "2024-04-04T13:20:19.048389400Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "outputs": [],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "#\n",
    "# def pie_counter(counter):\n",
    "#     labels = counter.keys()\n",
    "#     sizes = counter.values()\n",
    "#\n",
    "#     fig1, ax1 = plt.subplots(dpi=300)\n",
    "#     ax1.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)\n",
    "#     ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.\n",
    "#     plt.savefig(\"error_type_pie.png\")\n",
    "#     plt.show()\n",
    "# pie_counter(ec)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-03T13:17:40.784367100Z",
     "start_time": "2024-04-03T13:17:40.769070700Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 8192/8192 [02:59<00:00, 45.63it/s]\n"
     ]
    }
   ],
   "source": [
    "# 运行错误统计\n",
    "# 类型一： 模版修复成功。找auto_fix=True, try to fix 1 failures and 0 errors!\n",
    "# 类型二： 模版修复失败。\n",
    "# 类型三： 大模型修复成功。 token >= 2\n",
    "# 类型四： 大模型修复失败。\n",
    "def get_all_msg(attempt):\n",
    "    import requests\n",
    "    base_url = \"http://127.0.0.1:4200/api/logs/filter\"\n",
    "    payload = {\n",
    "        \"logs\": {\n",
    "            \"level\": {\n",
    "                \"ge_\": 20\n",
    "            },\n",
    "            \"flow_run_id\": {\n",
    "                \"any_\": [\n",
    "                    f\"{attempt.link.split('/')[-1]}\"\n",
    "                ]\n",
    "            }\n",
    "        },\n",
    "        \"sort\": \"TIMESTAMP_ASC\",\n",
    "        \"offset\": 0,\n",
    "        \"limit\": 200\n",
    "    }\n",
    "\n",
    "    # parse response\n",
    "    response = requests.post(base_url, json=payload)\n",
    "    arr = response.json()\n",
    "    messages = [x['message'] for x in arr if 40 > x['level'] >= 20]\n",
    "    return messages\n",
    "\n",
    "from tqdm import tqdm\n",
    "all_message_list = []\n",
    "for index in tqdm(range(0, len(attempts))):\n",
    "    messages = get_all_msg(attempts[index])\n",
    "    all_message_list.append(messages)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T02:28:42.559522100Z",
     "start_time": "2024-04-04T02:25:42.997736400Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 209,
   "outputs": [],
   "source": [
    "failure_types = [\n",
    "    \"org.junit.Assert.assertEquals\",\n",
    "    ([\"org.junit.Assert.assertTrue\", \"org.junit.Assert.assertFalse\"], \"assertTrue/False\"),\n",
    "    ([\"org.junit.Assert.assertNotNull\", \"org.junit.Assert.assertNull\"], \"assertNull/NotNull\"),\n",
    "    ([\"org.mockito.exceptions\",\"Argument(s) are different!\",\"Wanted but not invoked\"], \"mockito\"),\n",
    "    \"org.junit.Assert.assertThat\",\n",
    "]\n",
    "\n",
    "\n",
    "def get_failure_set(msg_list, status):\n",
    "    msg_list = [\n",
    "        x for x in msg_list\n",
    "        if \"[INFO]  T E S T S\" in x\n",
    "    ]\n",
    "    pattern = r\"Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+)\"\n",
    "    import re\n",
    "    def get_exec_tuple(message: str):\n",
    "        match = re.search(pattern, message)\n",
    "        if match:\n",
    "            return tuple(map(int, match.groups()))\n",
    "        return (0, 0, 0)\n",
    "    failure_set = {status}\n",
    "    for msg in msg_list:\n",
    "        t = get_exec_tuple(msg)\n",
    "        if t[1] + t[2] == 0:\n",
    "            continue\n",
    "        flag = False\n",
    "        if t[2] > 0:\n",
    "            failure_set.add(\"error\")\n",
    "            continue\n",
    "        for failure_type in failure_types:\n",
    "            if isinstance(failure_type, tuple):\n",
    "                for sub_failure_type in failure_type[0]:\n",
    "                    if sub_failure_type in msg:\n",
    "                        flag = True\n",
    "                        failure_set.add(sub_failure_type)\n",
    "            elif failure_type in msg:\n",
    "                flag = True\n",
    "                failure_set.add(failure_type)\n",
    "        if not flag:\n",
    "            failure_set.add(\"fail\" if \"org.junit.Assert.fail\" in msg else \"others\")\n",
    "\n",
    "\n",
    "    return failure_set"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T06:29:50.732631900Z",
     "start_time": "2024-04-04T06:29:50.730625600Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 210,
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 8192/8192 [00:00<00:00, 26757.19it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "126\n",
      "Counter({'org.junit.Assert.assertEquals': 1360, 'error': 918, 'org.junit.Assert.assertTrue': 436, 'fail': 206, 'org.junit.Assert.assertFalse': 178, 'org.junit.Assert.assertNotNull': 117, 'org.junit.Assert.assertNull': 108, 'Wanted but not invoked': 30, 'others': 18, 'Argument(s) are different!': 7, 'org.mockito.exceptions': 6, 'org.junit.Assert.assertThat': 2})\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": "Counter({'org.junit.Assert.assertEquals': 1360,\n         'error': 918,\n         'org.junit.Assert.assertTrue': 436,\n         'fail': 206,\n         'org.junit.Assert.assertFalse': 178,\n         'org.junit.Assert.assertNotNull': 117,\n         'org.junit.Assert.assertNull': 108,\n         'others': 63})"
     },
     "execution_count": 210,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 统计错误类型\n",
    "from collections import Counter\n",
    "\n",
    "failure_sets = []\n",
    "skip = 0\n",
    "for index in tqdm(range(0, len(all_message_list))):\n",
    "    attempt = attempts[index]\n",
    "    if attempt.type == IterationStatus.Type.PASS:\n",
    "        if attempt.index != len(attempt.sub_iteration_status):\n",
    "            for it in attempt.sub_iteration_status[attempt.index:]:\n",
    "                if it.type == IterationStatus.Type.RUNTIME_ERROR:\n",
    "                    skip += 1\n",
    "                    continue\n",
    "    failure_sets.append(get_failure_set(all_message_list[index], str(attempt.type)))\n",
    "    count += 1\n",
    "print(skip)\n",
    "failure_counter = Counter()\n",
    "PASS_failure_sets = []\n",
    "COMPILE_ERROR_failure_sets = []\n",
    "RUNTIME_ERROR_failure_sets = []\n",
    "SYNTAX_ERROR_failure_sets = []\n",
    "filtered_failure_sets = []\n",
    "for failure_set in failure_sets:\n",
    "    if str(IterationStatus.Type.COMPILE_ERROR) in failure_set:\n",
    "        continue\n",
    "        failure_set.remove(str(IterationStatus.Type.COMPILE_ERROR))\n",
    "        COMPILE_ERROR_failure_sets.append(failure_set)\n",
    "    if str(IterationStatus.Type.PASS) in failure_set:\n",
    "        failure_set.remove(str(IterationStatus.Type.PASS))\n",
    "        PASS_failure_sets.append(failure_set)\n",
    "    if str(IterationStatus.Type.RUNTIME_ERROR) in failure_set:\n",
    "        failure_set.remove(str(IterationStatus.Type.RUNTIME_ERROR))\n",
    "        RUNTIME_ERROR_failure_sets.append(failure_set)\n",
    "    if str(IterationStatus.Type.SYNTAX_ERROR) in failure_set:\n",
    "        continue\n",
    "        failure_set.remove(str(IterationStatus.Type.SYNTAX_ERROR))\n",
    "        SYNTAX_ERROR_failure_sets.append(failure_set)\n",
    "        continue\n",
    "    if str(IterationStatus.Type.FAIL) in failure_set:\n",
    "        continue\n",
    "    filtered_failure_sets.append(failure_set)\n",
    "    failure_counter.update(failure_set)\n",
    "\n",
    "\n",
    "# 删除系统错误\n",
    "remove_list = [\n",
    "    \"is already defined in\",\n",
    "    \"integer number too large\",\n",
    "    \"has already been caught\",\n",
    "]\n",
    "while True:\n",
    "    for x in remove_list:\n",
    "        if x in failure_counter:\n",
    "            del failure_counter[x]\n",
    "            break\n",
    "    else:\n",
    "        break\n",
    "\n",
    "# 将 <50 的类型合并为 \"others\"\n",
    "ec = Counter()\n",
    "for failure_set in filtered_failure_sets:\n",
    "    failure_set = [x if failure_counter[x] > 50 else \"others\" for x in failure_set]\n",
    "    ec.update(failure_set)\n",
    "\n",
    "pass_ec = Counter()\n",
    "for failure_set in PASS_failure_sets:\n",
    "    failure_set = [x if failure_counter[x] > 50 else \"others\" for x in failure_set]\n",
    "    pass_ec.update(failure_set)\n",
    "\n",
    "print(failure_counter)\n",
    "ec"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T06:29:57.727650400Z",
     "start_time": "2024-04-04T06:29:57.322352700Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 211,
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Convert the Counter objects to dictionaries\n",
    "ec_dict = dict(ec)\n",
    "pass_ec_dict = dict(pass_ec)\n",
    "\n",
    "# Merge the two dictionaries into a new dictionary\n",
    "merged_dict = {key: [ec_dict.get(key, 0), pass_ec_dict.get(key, 0), ec_dict.get(key, 0) - pass_ec_dict.get(key, 0)] for key in set(ec_dict) | set(pass_ec_dict)}\n",
    "\n",
    "# Convert the merged dictionary to a pandas DataFrame\n",
    "df = pd.DataFrame.from_dict(merged_dict, orient='index', columns=['ec', 'pass_ec', 'diff'])\n",
    "# Sort the DataFrame by 'ec' in descending order\n",
    "df = df.sort_values(by='ec', ascending=False)\n",
    "\n",
    "# Save the DataFrame to a CSV file\n",
    "df.to_csv('运行错误类型统计.csv', encoding='utf_8_sig')"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-04-04T06:30:10.485647700Z",
     "start_time": "2024-04-04T06:30:10.480907Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
